WIP: Proof of concept using Loki to store and retrieve logs #1540

Open, wants to merge 13 commits into base: edu/metal-lb

Conversation

hydrogen18
Contributor

This is my proof of concept for using Loki to store & retrieve logs. It builds on the work I did on the metal-lb branch, which cleaned up some of the way we interact with operators.

Changes

  1. Add Loki to the development environment, running it within kind
  2. Add code to query Loki's HTTP interface
  3. Rework the lease logs command to query Loki

Each namespace in Kubernetes becomes an "organization" in Loki, so logs are automatically kept isolated between deployments. Promtail picks up the logs automatically and forwards them to Loki. Accessing them is pretty simple: I wrote a client to make the HTTP requests to Loki. Importing their client is not very practical, and making the actual requests is not complex. I copied some code from Loki, which uses the same license as our project anyway.
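To illustrate how little is involved in talking to Loki directly, here is a minimal sketch in Python (not the client in this PR). It builds a request against Loki's `/loki/api/v1/query_range` endpoint and passes the Kubernetes namespace as the tenant in the `X-Scope-OrgID` header. The function name, the base URL, the namespace, and the `pod` label in the query are all hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_range_request(base_url, namespace, logql, start_ns, end_ns, limit=100):
    """Build (but do not send) a Loki range query scoped to one tenant.

    Assumption: the Kubernetes namespace is used verbatim as the Loki
    organization, which Loki reads from the X-Scope-OrgID header.
    """
    params = urlencode({
        "query": logql,            # LogQL selector, e.g. '{pod="web-0"}'
        "start": str(start_ns),    # Unix time in nanoseconds
        "end": str(end_ns),        # Unix time in nanoseconds
        "limit": str(limit),       # maximum number of entries to return
        "direction": "forward",    # oldest entries first
    })
    url = f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"
    return Request(url, headers={"X-Scope-OrgID": namespace}, method="GET")


if __name__ == "__main__":
    # Hypothetical values; nothing is sent over the network here.
    req = build_query_range_request(
        "http://loki.example:3100", "lease-ns-1", '{pod="web-0"}', 0, 10**9
    )
    print(req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the query; the response body is JSON with the matching streams under `data.result`.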

Selecting Loki and supporting it as our only solution makes sense for now. It is an open source project under active development, designed to work with Kubernetes. It would be nice to support other log retention solutions, but I don't think that is practical for now.

One neat feature I've added is the ability to identify and limit the logs to a specific run of a container. A new run starts whenever the container fails & restarts. I think this is helpful because, in the case of a container that deploys but fails quickly, an end user can request all the logs from a specific run to try and figure out why it is failing. It's still on the end user to deploy a container that logs helpful information, but this should make things easier.
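The description above does not say how a run is identified, so the following Python sketch shows only one possible approach, not necessarily what this PR does. It assumes Promtail attaches a `filename` label holding the kubelet log path, and that the path ends in `<container>/<restart count>.log`. The helper names are hypothetical; the stream layout (`stream` labels plus `values` pairs of timestamp and line) follows Loki's query response format.

```python
import re

# Assumption: kubelet writes container logs to
#   /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart count>.log
_RUN_RE = re.compile(r"/(?P<container>[^/]+)/(?P<run>\d+)\.log$")


def run_index(filename):
    """Return the restart count encoded in a log path, or None if absent."""
    match = _RUN_RE.search(filename)
    return int(match.group("run")) if match else None


def entries_for_run(streams, run):
    """Collect [timestamp, line] pairs belonging to one run, oldest first.

    `streams` is a list shaped like Loki's `data.result`:
    [{"stream": {label: value, ...}, "values": [[ts_ns, line], ...]}, ...]
    """
    selected = []
    for stream in streams:
        if run_index(stream["stream"].get("filename", "")) == run:
            selected.extend(stream["values"])
    # Timestamps arrive as strings of nanoseconds, so compare them as ints.
    return sorted(selected, key=lambda entry: int(entry[0]))
```

Filtering on the server side with a label matcher in the LogQL query would avoid transferring the other runs at all; the client-side version is shown only because it needs no assumptions about the query.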

One thing I haven't tried to account for is that running Loki should be optional. If a provider wants to host lots of computational workloads (like miners, or whatever), they shouldn't be required to run Loki. So we need to figure out what to do in that case. Should we just fall back to querying the kube logs?
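One possible shape for that fallback, sketched in Python under the assumption that both log sources can be hidden behind the same callable signature. Every name here (`BackendUnavailable`, `fetch_logs`, and the two fetch callables) is hypothetical and not part of this PR.

```python
class BackendUnavailable(Exception):
    """Raised by a log source that cannot serve the request."""


def fetch_logs(lease_id, kube_fetch, loki_fetch=None):
    """Return (source_name, lines), preferring Loki when it is configured.

    loki_fetch is None when the provider does not run Loki at all; it may
    also raise BackendUnavailable at request time. In both cases the logs
    come from the kube API instead.
    """
    if loki_fetch is not None:
        try:
            return "loki", loki_fetch(lease_id)
        except BackendUnavailable:
            pass  # fall through to the kube logs
    return "kube", kube_fetch(lease_id)
```

Returning the source name lets the caller tell the end user when they are only seeing whatever the kubelet still holds rather than the retained history.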

Incomplete / Unresolved

  1. Providers need to be configured to limit retention of logs to whatever they have storage for.
  2. Providers need to configure Loki to rate limit logs to something reasonable. This is an ongoing effort in Loki; hopefully we can pick this up once the changes are merged into a new release.
  3. When a lease is closed, we need to signal Loki to clear out all the logs it has retained for that lease. Otherwise Loki could run out of space retaining logs for closed leases.

@hydrogen18 hydrogen18 requested review from boz and troian as code owners March 14, 2022 16:31
@github-actions

Marked as stale; will be closed in five days.

Cut bait or go fishing!

@github-actions github-actions bot added the stale label Mar 25, 2022
@boz boz added the keepalive label (Exempt these when managing stale PRs) and removed the stale label Mar 25, 2022
Labels
keepalive Exempt these when managing stale PRs
3 participants